LOCALIZATION AND DELOCALIZATION FOR HEAVY TAILED BAND MATRICES

Abstract. We consider some random band matrices with band-width $N^{\mu}$ whose entries
are independent random variables with distribution tail in $x^{-\alpha}$. We consider the largest
eigenvalues and the associated eigenvectors and prove the following phase transition. On
the one hand, when $\alpha < 2(1+\mu^{-1})$, the largest eigenvalues have order $N^{\frac{1+\mu}{\alpha}}$, are
asymptotically distributed as a Poisson process and their associated eigenvectors are essentially
carried by two coordinates (this phenomenon has already been remarked by Soshnikov in
[26, 27] for full matrices with heavy tailed entries, i.e. when $\alpha < 2$, and by Auffinger et
al. in [1] when $\alpha < 4$). On the other hand, when $\alpha > 2(1+\mu^{-1})$, the largest eigenvalues
have order $N^{\mu/2}$ and most eigenvectors of the matrix are delocalized, i.e. approximately
uniformly distributed on their $N$ coordinates.



Introduction 

There has recently been growing interest in understanding the asymptotic
behavior of both eigenvalues and eigenvectors of random matrices in the large size limit.
For Wigner random matrices, that is $N \times N$ Hermitian or real symmetric random matrices
with i.i.d. entries (modulo the symmetry assumption), the large-$N$ asymptotic behavior
is now well understood, provided the distribution of the entries has sub-exponential decay
(or at least a large enough number of moments). It is indeed known from the works of
Erdos-Schlein-Yau, Tao-Vu and Knowles-Yin ( [121 H31 HH Ell [301 E] and see also references
therein) that : 

- eigenvalues are very close to their theoretical prediction given by well-chosen quantiles 
of the semi-circle distribution (the proof is based on a strong semi-circle law). This also 
yields universality of local statistics in the bulk and at the edge under some appropriate 
moment assumptions (see Erdos [5] e.g. for a review of the recent results). 

- eigenvectors are fully delocalized in the following sense. The localization length, L, of 
an eigenvector $v$ is the typical number of coordinates bearing most of its $\ell^2$ norm. Then
it is proved that with very "high probability" there does not exist an eigenvector with 



Date: October 30, 2012. 

2000 Mathematics Subject Classification. 15A52;60F05. 

Key words and phrases. Random matrices, band matrices, heavy tailed random variables. 
This work was partly accomplished during the first named author's stay at New York University Abu 
Dhabi, Abu Dhabi (U.A.E.). 


FLORENT BENAYCH-GEORGES AND SANDRINE PECHE 



localization length $L \ll N$. Roughly speaking, all coordinates are of order $N^{-1/2}$.

In this article, we want to fill the gap in the understanding of the role of moments in the
delocalization properties of eigenvectors. We will be interested in a model of random
matrices that we believe to be quite rich, namely random band matrices with heavy-tailed
entries.

More precisely, the matrices under consideration in this paper are Hermitian random 
matrices with at most $N^{\mu}$ non zero entries per row. In other words, we force some of the
entries of a Wigner matrix to be zero. This model is believed to be more complicated than 
Wigner ensembles due to the fact that there is no reference ensemble: there does not exist 
a "simple" band random matrix ensemble for which eigenvalue/eigenvector statistics can 
be explicitly computed as for the GUE/GOE in Wigner matrices. Thus usual comparison 
methods (four moments theorem, Green function comparison method) cannot be used 
directly in this setting. 

Such a model is also believed to exhibit a phase transition, depending on $\mu$. On a physical
level of rigor, Fyodorov and Mirlin [16] e.g. have explained that for Gaussian entries, the
localization length of a typical eigenvector in the bulk of the spectrum shall be of order
$L = O(N^{2\mu})$ so that eigenvectors should be localized (resp. delocalized or extended) if
$\mu < 1/2$ (resp. $\mu > 1/2$). The only rigorous result in the direction of localization is by
Schenker (52]. Therein it is proved that $L \ll N^{8\mu}$ for all eigenvectors of random band
matrices with i.i.d. Gaussian entries on the band. On the other hand, delocalization in
the bulk is proved by Erdos, Knowles, Yau and Yin [11] when $\mu > 4/5$. In both regimes,
it is known from Erdos and Knowles [HI EI] that typically $L \geq N^{7\mu/6}$ for a certain class of
random band matrices (with sub-exponential tails and symmetric distribution). We refer
the reader to Spencer [28] and Erdos, Schlein and Yau [13] for a more detailed discussion on
the localized/delocalized regime. Regarding the edges of the spectrum, much less is known
about the typical localization length of the associated eigenvectors. The authors are not
aware of a proof that eigenvectors at the edge are fully delocalized. However, Sodin's
statement [21] combined with the results of Erdos-Knowles-Yau-Yin [11] suggests that this should
be true when $\mu > 5/6$.

We will also allow the i.i.d. non zero entries to admit only a finite number of moments
(which can actually be zero). Allowing heavy-tailed entries allows some more localization,
especially at the edge of the spectrum, as we can infer from Wigner matrices. This is
discussed in particular in the seminal paper by Cizeau and Bouchaud [?]. It is known
that the limiting spectral measure of such Wigner matrices is the semi-circle distribution
provided that the variance of the entries is finite (otherwise another limiting distribution
has been identified by Guionnet and Ben Arous [5]). Regarding eigenvectors, it was shown
by Soshnikov [26, 27] and Auffinger, Ben Arous and Peche [1] that eigenvectors associated
to the largest eigenvalues have a localization length of order 1 if the entries do not admit a
finite fourth moment. The localization length is not so clear in the bulk but some progress
has been obtained by Bordenave and Guionnet [6]. However it is commonly believed that
the fourth moment shall be a threshold for the localization of eigenvectors at the edge of
the spectrum of full Wigner matrices. For band matrices when the bandwidth is negligible
w.r.t. the size $N$ of the matrix, no such threshold has been conjectured. This is also a gap we
intend to fill here.

Specifically, we prove the following phase transition. On the one hand, when
$\alpha < 2(1+\mu^{-1})$, the largest eigenvalues and associated eigenvectors are determined by the largest
entries. Largest eigenvalues are of order $N^{\frac{1+\mu}{\alpha}}$ and asymptotically distributed as a
Poisson process while the associated eigenvectors are essentially carried by two coordinates.
This phenomenon has already been noted by Soshnikov in [26, 27] for full matrices
($\mu = 1$) when $\alpha < 2$, and by Auffinger et al. in [1] when $\alpha < 4$. On the other hand, when
$\alpha > 2(1+\mu^{-1})$, so that $N^{\frac{1+\mu}{\alpha}} \ll N^{\mu/2}$, largest entries no longer play a role. Then the
largest eigenvalues have order $N^{\mu/2}$ and most eigenvectors of the matrix are delocalized, i.e.
approximately uniformly distributed on their $N$ coordinates.
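
As a rough illustration of the dichotomy just described, the following sketch returns the exponent $e$ such that the largest eigenvalues have order $N^{e}$. The function names are hypothetical and only encode the two formulas quoted above; the behavior exactly at the threshold is not addressed by the text, and the sketch simply uses the fact that both formulas agree there.

```python
def critical_alpha(mu: float) -> float:
    """Threshold alpha = 2(1 + 1/mu) separating the two regimes."""
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must lie in (0, 1]")
    return 2.0 * (1.0 + 1.0 / mu)


def top_eigenvalue_exponent(alpha: float, mu: float) -> float:
    """Exponent e such that the largest eigenvalues have order N**e.

    Subcritical case (alpha below the threshold): e = (1 + mu) / alpha.
    Supercritical case: e = mu / 2.
    At the threshold itself both expressions coincide, so the value
    returned there is only a convention of this sketch.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha < critical_alpha(mu):
        return (1.0 + mu) / alpha
    return mu / 2.0
```

For full matrices ($\mu = 1$) the threshold is $\alpha = 4$, in line with the fourth-moment threshold recalled above.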

The paper is organized as follows. In Section 1, we state our two main theorems:
Theorem 1.1 is the localization result mentioned above about the extreme eigenvalues and
eigenvectors in the case $\alpha < 2(1+\mu^{-1})$ and Theorem 1.5 is the delocalization result
mentioned above about the extreme eigenvalues of the matrix and most of its eigenvectors in
the case $\alpha > 2(1+\mu^{-1})$. Sections 2, 3 and 4 are devoted to the proofs of these results
and the appendix is devoted to the proof of several technical results, including Theorem
5.3, a general result whose idea goes back to papers of Soshnikov about the surprising
phenomenon that certain Hermitian matrices have approximately equal largest eigenvalues
and largest entries.

Notation. For any functions (or sequences) $f, g$, we write $f(x) \sim g(x)$ (resp. $f(x) \overset{\mathrm{sl.}}{\sim} g(x)$)
when $f(x)/g(x) \to 1$ (resp. $f(x)/g(x)$ is slowly varying) as $x \to +\infty$. We denote by $\|v\|$
the $\ell^2$-norm of $v \in \mathbb{C}^N$ and by $\|A\|$ the $\ell^2 \to \ell^2$ operator norm of a matrix $A$. When $A$ is
normal, then $\|A\| = \rho(A)$ where $\rho(A)$ is the spectral radius of $A$ and we use equivalently
both notations.

An event $A_N$ depending on a parameter $N$ is said to hold with exponentially high probability
(abbreviated w.e.h.p. in the sequel) if $\mathbb{P}(A_N) \geq 1 - e^{-CN^{\theta}}$ for some $C, \theta > 0$.



1. The results 

Let us fix two exponents $\alpha > 0$ and $\mu \in (0, 1]$ and let, for each $N$, $A_N = [a_{ij}]_{i,j=1}^{N}$ be a
real symmetric (or complex Hermitian) random matrix such that for all $i$'s in $\{1, \ldots, N\}$
except possibly $o(N)$ of them,

(1) $\#\{j \,;\, a_{ij} \text{ is not almost surely zero}\} \sim N^{\mu}$,

whose non almost surely null entries are i.i.d. modulo the symmetry assumption and such
that for a certain $\alpha > 0$,

(2) $G(x) := \mathbb{P}(|a_{ij}| > x) \overset{\mathrm{sl.}}{\sim} x^{-\alpha}$ as $x \to \infty$.

If $\alpha \geq 1 + \mu^{-1}$ we also suppose that the $a_{ij}$'s are symmetrically distributed. This symmetry
assumption simplifies the exposition of arguments but can be relaxed (we briefly indicate
this possible extension in Remark 2.5 below). Note that for all fixed $i, j$, $a_{ij}$ might depend
on $N$ (think for example of the case where $A_N$ is a band matrix), hence should be denoted
by $a_{ij}(N)$. However, we suppose that the estimate (2) is uniform in $N$ and that if
$\alpha > 2(1+\mu^{-1})$, the second moment of the non identically zero entries of $A_N$ is equal to one.

The standard example of matrices satisfying (1) is given by band matrices, i.e. matrices
with entries $a_{ij}$ such that $a_{ij} = 0$ when $|i - j| > N^{\mu}/2$. Another very close example is
the one of cyclic band matrices, i.e. matrices with entries $a_{ij}$ such that $a_{ij} = 0$ when
$N^{\mu}/2 < |i - j| < N - N^{\mu}/2$.
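
The two sparsity patterns can be sketched as index predicates. This is only an illustration under the assumption that the half band-width is `width / 2` with `width` playing the role of $N^{\mu}$; the function names are hypothetical.

```python
def is_band_entry(i: int, j: int, width: float) -> bool:
    """True if a_ij may be nonzero in a band matrix: |i - j| <= width / 2."""
    return abs(i - j) <= width / 2


def is_cyclic_band_entry(i: int, j: int, n: int, width: float) -> bool:
    """True if a_ij may be nonzero in a cyclic band matrix of size n.

    The entry is forced to zero when width/2 < |i - j| < n - width/2,
    i.e. it may be nonzero when the cyclic distance is at most width/2.
    """
    d = abs(i - j)
    return min(d, n - d) <= width / 2


def nonzero_count_per_row(n: int, width: float, cyclic: bool) -> list:
    """Number of possibly nonzero entries in each row, for rows 0..n-1."""
    if cyclic:
        return [sum(is_cyclic_band_entry(i, j, n, width) for j in range(n))
                for i in range(n)]
    return [sum(is_band_entry(i, j, width) for j in range(n))
            for i in range(n)]
```

In the cyclic case every row carries the same number of possibly nonzero entries, whereas in the plain band case the rows near the corners carry fewer, which is why (1) is only required for all but $o(N)$ rows.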

We denote by $\lambda_1 \geq \lambda_2 \geq \cdots$ the eigenvalues of $A_N$ (they depend implicitly on $N$) and
we choose some unit associated eigenvectors

$v_1, v_2, \ldots$

Let us also introduce a set of pairs of indices $(i_1 \leq j_1), (i_2 \leq j_2), \ldots$ such that for all $k$, $|a_{i_k j_k}|$
is the $k$th largest entry, in absolute value, of $A_N$. Let $\theta_k \in \mathbb{R}$ be such that $a_{i_k j_k} = |a_{i_k j_k}| e^{2i\theta_k}$.
The eigenvectors $v_1, v_2, \ldots$ are chosen such that for each $k$,

$e^{-i\theta_k} \langle v_k, e_{i_k} \rangle \geq 0$,

with $e_1, \ldots, e_N$ the vectors of the canonical basis.

As we shall see in the two following theorems, the asymptotic behavior of both the
largest eigenvalues of $A_N$ and their associated eigenvectors exhibits a phase transition with
threshold $\alpha = 2(1+\mu^{-1})$.

Theorem 1.1 (Subcritical case). Let us suppose that $\alpha < 2(1+\mu^{-1})$. Then for each fixed
$k \geq 1$, we have the convergences in probability, as $N \to \infty$,

(3) $\frac{\lambda_k}{|a_{i_k j_k}|} \longrightarrow 1$

and

(4) $v_k - \frac{1}{\sqrt{2}} \left( e^{i\theta_k} e_{i_k} + e^{-i\theta_k} e_{j_k} \right) \longrightarrow 0$ (for the $\ell^2$-norm).

As a consequence, for $b_N$ the sequence defined by (5) below, the random point process

$\sum_{k \,;\, |a_{i_k j_k}| > 0} \delta_{\lambda_k / b_N}$

converges in law to the law of a Poisson point process on $(0, +\infty)$ with intensity measure
$\frac{\alpha}{x^{\alpha+1}} dx$.

The sequence $b_N$ of the theorem is defined by

(5) $b_N := \inf \left\{ x \geq 0 \,;\, G(x) \leq \frac{1}{\#\{\text{non identically zero independent entries of } A_N\}} \right\}$

where $G(x)$ is defined by (2). It can easily be deduced from (1) and (2) that

(6) $b_N \overset{\mathrm{sl.}}{\sim} N^{\frac{1+\mu}{\alpha}}$.
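
To make definition (5) concrete, here is a minimal numerical sketch that computes $b_N$ by bisection for a non-increasing tail function. The pure Pareto tail used in the example ($G(x) = x^{-\alpha}$ for $x \geq 1$) is an assumption made for illustration only; the function names are hypothetical.

```python
def b_n(tail, num_entries: int, iterations: int = 200) -> float:
    """Approximate inf{x >= 0 : tail(x) <= 1 / num_entries} by bisection.

    `tail` is assumed to be non-increasing and to tend to 0 at infinity.
    """
    target = 1.0 / num_entries
    hi = 1.0
    while tail(hi) > target:  # find an upper bracket by doubling
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tail(mid) <= target:
            hi = mid
        else:
            lo = mid
    return hi


def pareto_tail(alpha: float):
    """Illustrative tail: G(x) = 1 for x < 1 and x**(-alpha) for x >= 1."""
    return lambda x: 1.0 if x < 1.0 else x ** (-alpha)
```

For this particular tail, $b_N$ is simply the number of independent entries raised to the power $1/\alpha$; with of order $N^{1+\mu}$ such entries this gives the order of magnitude stated in (6).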

Roughly speaking, this theorem says that when $\alpha < 2(1+\mu^{-1})$, the largest eigenvalues of
$A_N$ have order $N^{\frac{1+\mu}{\alpha}}$, but no fixed limit when divided by $N^{\frac{1+\mu}{\alpha}}$, because the limiting object
is a Poisson process. Moreover, the corresponding eigenvectors are essentially supported
by two components. As we shall see in the following theorem, the case $\alpha > 2(1+\mu^{-1})$ is
deeply different: in this case, the largest eigenvalues of $A_N$ have order $N^{\mu/2}$ and tend to 2
when divided by $N^{\mu/2}$, whereas the eigenvectors are much more delocalized, i.e. supported
by a large number of components.
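
The two-coordinate picture of the subcritical case can be checked by hand on the Hermitian matrix with a single off-diagonal entry $a = |a| e^{2i\theta}$ (and its conjugate). The sketch below is only such a toy check, with hypothetical function names, and is not the argument used in the proofs.

```python
import cmath
import math


def two_coordinate_eigenpair(a: complex):
    """Top eigenpair of H = [[0, a], [conj(a), 0]].

    Writing a = |a| * exp(2i*theta), the eigenvalue is |a| and a unit
    eigenvector is (exp(i*theta), exp(-i*theta)) / sqrt(2).
    """
    theta = cmath.phase(a) / 2.0
    lam = abs(a)
    v = (cmath.exp(1j * theta) / math.sqrt(2.0),
         cmath.exp(-1j * theta) / math.sqrt(2.0))
    return lam, theta, v


def apply_h(a: complex, v):
    """Apply H = [[0, a], [conj(a), 0]] to the vector v = (v0, v1)."""
    return (a * v[1], a.conjugate() * v[0])
```

The first coordinate of this eigenvector, multiplied by $e^{-i\theta}$, equals $1/\sqrt{2} \geq 0$, which matches the phase normalization chosen for the $v_k$'s.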

To be more precise, we use the following Definition 7.1 from Erdos, Schlein and Yau [13].

Definition 1.2. Let $L$ be a positive integer and $\eta > 0$ be given. A unit vector
$v = (v_1, \ldots, v_N) \in \mathbb{C}^N$ is said to be $(L, \eta)$-localized if there exists a set $S \subset \{1, \ldots, N\}$ such
that $|S| = L$ and $\sum_{j \in S^c} |v_j|^2 \leq \eta$.

We shall also use the following slightly modified version of the above definition.

Definition 1.3. Let $L$ be a positive integer and $\eta > 0$ be given. A unit vector
$v = (v_1, \ldots, v_N) \in \mathbb{C}^N$ is said to be $(L, \eta)$-successively localized if there exists a set $S$ which
is an interval of the set $\{1, \ldots, N\}$ endowed with the cyclic order such that $|S| = L$ and
$\sum_{j \in S^c} |v_j|^2 \leq \eta$.

Remark 1.4. The larger $L$ and $\eta$, the stronger a statement of the type "There is no
$(L, \eta)$-localized eigenvector" is.
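
Both definitions can be tested directly on a given vector: for fixed $L$, the best set $S$ collects the $L$ largest squared moduli, and the best cyclic interval can be found by scanning all $N$ starting points. The following is a minimal sketch with hypothetical function names.

```python
def mass_outside_best_set(v, L: int) -> float:
    """Smallest value of sum_{j not in S} |v_j|^2 over all sets S with |S| = L."""
    weights = sorted((abs(x) ** 2 for x in v), reverse=True)
    return sum(weights[L:])


def mass_outside_best_interval(v, L: int) -> float:
    """Same quantity, with S restricted to cyclic intervals of length L."""
    n = len(v)
    weights = [abs(x) ** 2 for x in v]
    best_inside = max(sum(weights[(start + t) % n] for t in range(L))
                      for start in range(n))
    return sum(weights) - best_inside


def is_localized(v, L: int, eta: float) -> bool:
    """(L, eta)-localized in the sense of Definition 1.2."""
    return mass_outside_best_set(v, L) <= eta


def is_successively_localized(v, L: int, eta: float) -> bool:
    """(L, eta)-successively localized in the sense of Definition 1.3."""
    return mass_outside_best_interval(v, L) <= eta
```

A vector carried by two far-apart coordinates is $(2, \eta)$-localized for every $\eta > 0$ but is not $(2, \eta)$-successively localized for small $\eta$, which shows that the second notion is more restrictive.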

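Theorem 1.5 (Supercritical case). Let us suppose that $\alpha > 2(1+\mu^{-1})$. Then for each
fixed $k \geq 1$, as $N \to \infty$, we have the convergence in probability

(7) $\frac{\lambda_k}{N^{\mu/2}} \longrightarrow 2$.

Moreover, for $L := \lfloor N^{c} \rfloor$, with $c$ such that

(8) $c < \frac{2}{5}\, \mu\, \frac{\alpha - 2}{\alpha - 1}$ (resp. $c < \mu$),

for any $\eta_0 < 1/2$, we have, as $N \to \infty$,

(9) $\mathbb{P}\left( \bigcup_{\eta < \eta_0} \left\{ \exists k,\ |\lambda_k| > \sqrt{2\eta}\, \rho(A) \text{ and } v_k \text{ is } (L, \eta)\text{-localized (resp. } (L, \eta)\text{-successively localized)} \right\} \right) \longrightarrow 0$.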

Remark 1.6. Note that this theorem does not only apply to the edges of the spectrum,
as $\eta$ runs from 0 to $\eta_0$ in (9).

Remark 1.7. Note that focusing on successively localized vectors, we would need to
improve the bound $c < \mu$ in order to get some flavor of the usual threshold of the so-called
Anderson transition. The localization length $L$ of typical eigenvectors in the bulk is indeed
supposed numerically to be in the order of $L \approx N^{2\mu}$ when $\mu < 1/2$ for entries with many
moments. At the edge of the spectrum, the authors are not aware of any conjectured (even at
a physical level of rigor) localization length in the localized regime.

To prove both above theorems, we shall also use the following result, which had not 
appeared at this level of generality yet. 

Theorem 1.8. We suppose that the hypotheses (1) and (2) hold with $\alpha > 2$ and that the
first and second moments of the non identically zero entries of $A_N$ are respectively equal
to 0 and 1. Then the empirical spectral measure of $A_N / N^{\mu/2}$ converges almost surely to the
semi-circle law with support $[-2, 2]$.

Proof. The proof relies on a classical cut-off and moments method, copying the proof
of the convergence to the semi-circle distribution for standard Wigner matrices (see for
example [2, Th. 2.5]). □

2. A PRELIMINARY RESULT: GENERAL UPPER-BOUND ON THE MOMENTS 

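Theorem 2.1. Consider some positive exponents $\gamma, \gamma', \gamma''$ such that

(10) $\frac{\mu}{2} \leq \gamma'$ and $\frac{\mu}{4} + \gamma + \gamma'' \leq \gamma'$

and define the truncated matrix $\hat{A}_N = [a_{ij} 1_{|a_{ij}| \leq N^{\gamma}}]_{i,j=1}^{N}$. Then for $s_N \leq N^{\gamma''}$, there exists
a slowly varying function $L$ such that

$\mathbb{E}[\mathrm{Tr}(\hat{A}_N^{2 s_N})] \leq L(N)\, N^{1+2\gamma}\, s_N^{-3/2}\, (2 N^{\gamma'})^{2 s_N}$.

The following corollary follows directly from the theorem and from the Chebyshev
inequality.

Corollary 2.2. For any $\kappa \geq 1$ (possibly depending on $N$), we have, up to a polynomial
factor in the RHT,

(11) $\mathbb{P}\left( \|\hat{A}_N\| \geq \kappa \times 2 N^{\gamma'} \right) \leq \kappa^{-2 s_N}$.

Remark 2.3. Roughly speaking, this theorem says that for any $\varepsilon > 0$,

$\|\hat{A}_N\| \leq (2 + \varepsilon)\, N^{\max\{\frac{\mu}{2},\, \frac{\mu}{4} + \gamma\}}$ for $N \gg 1$.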

Remark 2.4. Note that for the theorem and its corollary to be true, one does not really
need the size of the matrix to be $N$, but just to be not more than a fixed power of $N$. This
remark will allow us to apply the estimate (11) to submatrices of $A_N$.

Proof of the theorem. Our strategy will be to use the ideas of Soshnikov, well explained in
[23] (see also [25] or [1]). We shall also need an estimate on the moments of the truncated
variables $\hat{a}_{ij} := a_{ij} 1_{|a_{ij}| \leq N^{\gamma}}$. By [15, Chap. VIII.9, Th. 2.23], we have that for any $k \geq 0$,
for any (non identically null) $a_{ij}$, there is a slowly varying sequence $L(N)$ such that

(12) $\mathbb{E}[|\hat{a}_{ij}|^{k}] \leq L(N)$ if $k \leq \alpha$ and $\mathbb{E}[|\hat{a}_{ij}|^{k}] \leq L(N)\, N^{\gamma(k - \alpha)}$ if $k > \alpha$.

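We have, suppressing the dependence on $N$ to simplify the notation,

$\mathrm{Tr}\, \hat{A}^{2s} = \sum_{1 \leq i_0, \ldots, i_{2s} \leq N,\ i_0 = i_{2s}} \hat{a}_{i_0 i_1} \cdots \hat{a}_{i_{2s-1} i_{2s}}$.

To any $\mathbf{i} = (i_0, \ldots, i_{2s})$ such that $i_0 = i_{2s}$, we associate the non oriented graph
$G_{\mathbf{i}} := (V_{\mathbf{i}}, E_{\mathbf{i}})$ with vertex set $\{i_0, \ldots, i_{2s}\}$ and edges $\{i_{\ell-1}, i_{\ell}\}$, $1 \leq \ell \leq 2s$, and the closed path
$P_{\mathbf{i}} = i_0 \to i_1 \to \cdots \to i_{2s}$ on this graph.

Since the $a_{ij}$'s are symmetrically distributed, each edge of $G_{\mathbf{i}}$ has to be visited an even
number of times by $P_{\mathbf{i}}$ for the contribution of $\mathbf{i}$ to $\mathbb{E}\, \mathrm{Tr}\, \hat{A}^{2s}$ to be nonzero.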

To such an $\mathbf{i}$, we associate a set $M_{\mathbf{i}}$ of $s$ marked instants as follows. We read the edges of
$P_{\mathbf{i}}$ successively. The instant at which an edge $\{i, j\}$ is read is then said to be marked if up
to that moment (inclusive) the edge $\{i, j\}$ was read an odd number of times (note that the
instants are counted from 1 to $2s$, hence the instant where, for example, the edge $i_0 \to i_1$
is read is the instant 1). Other instants are said to be unmarked. Since each edge of $G_{\mathbf{i}}$ is
visited an even number of times by $P_{\mathbf{i}}$, it is clear that

$\# M_{\mathbf{i}} = s$.

Now, for each $0 \leq k \leq s$, we define $N_{\mathbf{i}}(k)$ to be the set of $i$'s in $\{1, \ldots, N\}$ occurring
exactly $k$ times as the current vertex of a marked instant and $n_k$ to be its cardinality. Let
the family $(n_0, \ldots, n_s)$ be called the type of $\mathbf{i}$. Note that we have

(13) $\sum_{k=0}^{s} n_k = N$ and $\sum_{k=0}^{s} k\, n_k = s$.
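
The marking procedure and the type of a path can be sketched as follows. This is an illustration only: vertices are labelled `0, ..., n_vertices - 1`, and "the current vertex of a marked instant" is taken here to be the vertex reached at that instant, which is an assumption of this sketch about the convention.

```python
from collections import Counter


def marked_instants(path):
    """Marked instants (counted from 1) of the path i_0 -> i_1 -> ... -> i_{2s}."""
    visits = Counter()
    marked = []
    for t in range(1, len(path)):
        edge = frozenset((path[t - 1], path[t]))  # unoriented edge
        visits[edge] += 1
        if visits[edge] % 2 == 1:  # read an odd number of times so far
            marked.append(t)
    return marked


def path_type(path, n_vertices: int):
    """Type (n_0, ..., n_s) of a closed path whose edges are all visited evenly."""
    if path[0] != path[-1] or len(path) % 2 == 0:
        raise ValueError("expected a closed path with an even number of steps")
    s = (len(path) - 1) // 2
    marked = marked_instants(path)
    if len(marked) != s:
        raise ValueError("some edge is visited an odd number of times")
    hits = Counter(path[t] for t in marked)  # vertex reached at each marked instant
    counts = [0] * (s + 1)
    for vertex in range(n_vertices):
        counts[hits[vertex]] += 1
    return counts
```

With this convention both identities of (13) can be checked on small examples: the entries of the type sum to the number of vertices, and the weighted sum $\sum_k k\, n_k$ equals $s$.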

Let us now count the number of $\mathbf{i}$'s with fixed type $(n_0, \ldots, n_s)$ (where the $n_k$'s satisfy
(13)). To define such an $\mathbf{i}$, one first has to choose the set $M_{\mathbf{i}}$ of marked instants: there are
as many possibilities as Dyck paths, i.e. the Catalan number

$C_s := \frac{1}{s+1} \binom{2s}{s}$.

Then one has to choose an unlabelled partition of $M_{\mathbf{i}}$ defined by the fact that two marked
instants are in the same class if and only if the path $P_{\mathbf{i}}$ is at the same vertex of $G_{\mathbf{i}}$ at both
of these instants. Such a partition is only required to have $n_k$ blocks of cardinality $k$ for
each $k = 1, \ldots, s$. Hence there are

$\frac{s!}{\prod_{k=1}^{s} (k!)^{n_k}} \times \frac{1}{\prod_{k=1}^{s} n_k!}$

possibilities (the first factor counting the labelled partitions and the second one
"delabelling"). At this point, one has to choose the vertices of $G_{\mathbf{i}}$. For $i_0$, there are $N$
possibilities. For each other vertex, there are at most $N^{\mu}$ possibilities. There are at most
$n_1 + \cdots + n_s$ other vertices. Indeed, except possibly $i_0$, each vertex is occurring a certain
number of times as the current vertex of a marked instant (for example at the first time the
vertex is visited by the path $P_{\mathbf{i}}$). Hence there are at most $N^{1 + \mu(n_1 + \cdots + n_s)}$ possibilities for the
choices of the vertices of $G_{\mathbf{i}}$. There now remains to give an upper bound on the number of
ways to determine vertices at unmarked instants (such vertices will not be new, but still
have to be chosen among the previously chosen vertices). Soshnikov proved in [25] that this
number is not larger than $\prod_{k=2}^{s} (2k)^{k n_k}$ (the idea is that the number of ways to determine
the endpoint of an edge starting from a vertex of type $k$ at an unmarked instant is at most
$2k$).
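
The two combinatorial factors above are easy to evaluate exactly. The sketch below (hypothetical function names) computes the Catalan number and the number of set partitions of $s$ marked instants with $n_k$ blocks of size $k$; as a sanity check, summing the latter over all admissible $(n_1, \ldots, n_s)$ must give the total number of set partitions of an $s$-element set (the Bell number).

```python
from math import comb, factorial


def catalan(s: int) -> int:
    """Catalan number C_s = binom(2s, s) / (s + 1)."""
    return comb(2 * s, s) // (s + 1)


def partitions_with_block_counts(block_counts) -> int:
    """Number of set partitions with block_counts[k-1] = n_k blocks of size k.

    The underlying set has s = sum_k k * n_k elements and the count is
    s! / prod_k ((k!)**n_k * n_k!).
    """
    s = sum(k * nk for k, nk in enumerate(block_counts, start=1))
    denominator = 1
    for k, nk in enumerate(block_counts, start=1):
        denominator *= factorial(k) ** nk * factorial(nk)
    return factorial(s) // denominator


def block_count_vectors(s: int, largest: int = None):
    """Yield all (n_1, ..., n_s) with sum_k k * n_k = s."""
    def rec(remaining, k):
        if k == 0:
            if remaining == 0:
                yield []
            return
        for nk in range(remaining // k + 1):
            for rest in rec(remaining - k * nk, k - 1):
                yield rest + [nk]
    yield from rec(s, s if largest is None else largest)
```

For instance, with $s = 4$ the five admissible types contribute 1, 6, 3, 4 and 1 partitions respectively.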

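To sum up, the number of $\mathbf{i}$'s with fixed type $(n_0, \ldots, n_s)$ is at most

$C_s \times \frac{s!}{\prod_{k=1}^{s} (k!)^{n_k}} \times \frac{1}{\prod_{k=1}^{s} n_k!} \times N^{1 + \mu(n_1 + \cdots + n_s)} \times \prod_{k=2}^{s} (2k)^{k n_k}$.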

Let us now give an upper bound on the expectation $\mathbb{E}[\hat{a}_{i_0 i_1} \cdots \hat{a}_{i_{2s-1} i_{2s}}]$ depending on the
type of $\mathbf{i}$. For $i, j \in V_{\mathbf{i}}$, let $ij$ denote the edge $\{i, j\}$ of $G_{\mathbf{i}}$ (this edge is unoriented, so
$ij = ji$) and let $k(ij)$ denote half of the number of times that this edge is visited by $P_{\mathbf{i}}$, i.e.
the number of marked instants along edge $ij$. We also introduce $k_{i;ij}$ to be the number of
times that the vertex $i$ is marked along the edge $ij$. Clearly,

$k(ij) = k_{i;ij} + k_{j;ij}$ and $\mathrm{type}(i) = \sum_{j} k_{i;ij}$.

We know, by (12), that for a certain slowly varying sequence $L(N)$ (that can change at
every line)

$\mathbb{E}[\hat{a}_{i_0 i_1} \cdots \hat{a}_{i_{2s-1} i_{2s}}] \leq L(N) \prod_{e \,;\, 2k(e) > \alpha} N^{\gamma(2k(e) - \alpha)} \leq L(N) \prod_{e \,;\, k(e) \geq 2} N^{\gamma(2k(e) - 2)} = L(N)\, N^{-2\gamma E} \prod_{e \,;\, k(e) \geq 2} N^{2\gamma k(e)}$,

where $E$ denotes the number of edges $e$ such that $k(e) \geq 2$. Let us now enumerate the
edges via their extremities. Then

$\sum_{e \in E_{\mathbf{i}},\ k(e) \geq 2} k(e) = \#\{\text{marked instants along edges } e \text{ such that } k(e) \geq 2\}$

$= \sum_{(v,w) \in V_{\mathbf{i}}^2,\ k(vw) \geq 2} k_{v;vw} = \sum_{(v,w) \in V_{\mathbf{i}}^2,\ k(vw) \geq 2,\ \mathrm{type}(v) = 1} k_{v;vw} + \sum_{(v,w) \in V_{\mathbf{i}}^2,\ k(vw) \geq 2,\ \mathrm{type}(v) \geq 2} k_{v;vw}$.

Let us now use the fact, well known from [25j [201 H] that if an edge $vw$ is visited at least
4 times by the path $P_{\mathbf{i}}$, then at least one of $v$ and $w$ has type $\geq 2$, except for the first
visited vertex $i_0$. It follows that the first sum above is $\leq E + 1$. Hence

$\sum_{e \in E_{\mathbf{i}},\ k(e) \geq 2} k(e) \leq E + 1 + \sum_{v \in V_{\mathbf{i}},\ \mathrm{type}(v) \geq 2} \sum_{w} k_{v;vw} = E + 1 + \sum_{k=2}^{s} k\, n_k$.

Hence $\mathbb{E}[\hat{a}_{i_0 i_1} \cdots \hat{a}_{i_{2s-1} i_{2s}}] \leq L(N)\, N^{2\gamma (1 + \sum_{k=2}^{s} k n_k)}$.
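As a consequence,

$\mathbb{E}[\mathrm{Tr}(\hat{A}^{2s})] / N^{2s\gamma'} \leq L(N)\, N^{1+2\gamma}\, C_s\, N^{-2s\gamma'} \sum_{n_1, \ldots, n_s \text{ s.t. (13) holds}} \frac{s!}{\prod_{k=1}^{s} (k!)^{n_k}} \times \frac{1}{\prod_{k=1}^{s} n_k!} \times N^{\mu(n_1 + \cdots + n_s)} \times \prod_{k=2}^{s} (2k)^{k n_k} \times N^{2\gamma \sum_{k=2}^{s} k n_k}$.

Let us now use the facts $s! \leq n_1!\, s^{s - n_1} = n_1!\, s^{\sum_{k=2}^{s} k n_k} \leq n_1!\, N^{\gamma'' \sum_{k=2}^{s} k n_k}$ and
$N^{-2s\gamma'} = N^{-2\gamma' \sum_{k=1}^{s} k n_k}$. We get

$\mathbb{E}[\mathrm{Tr}(\hat{A}^{2s})] / N^{2s\gamma'} \leq L(N)\, N^{1+2\gamma}\, C_s \sum_{n_1, \ldots, n_s \text{ s.t. (13) holds}} N^{(\mu - 2\gamma') n_1} \prod_{k=2}^{s} \frac{1}{n_k!} \left( \frac{N^{\mu} (2k N^{2\gamma - 2\gamma' + \gamma''})^{k}}{k!} \right)^{n_k}$.

But by the hypothesis (10), $\mu - 2\gamma' \leq 0$, hence the first factor is $\leq 1$, so, using the fact
that by (13), $n_1$ is determined by the other $n_k$'s, we get

$\mathbb{E}[\mathrm{Tr}(\hat{A}^{2s})] / N^{2s\gamma'} \leq L(N)\, N^{1+2\gamma}\, C_s \sum_{n_2, \ldots, n_s \geq 0} \prod_{k=2}^{s} \frac{1}{n_k!} \left( \frac{N^{\mu} (2k N^{2\gamma - 2\gamma' + \gamma''})^{k}}{k!} \right)^{n_k}$

$\leq L(N)\, N^{1+2\gamma}\, C_s \exp\left( \sum_{k=2}^{s} \frac{N^{\mu} (2k N^{2\gamma - 2\gamma' + \gamma''})^{k}}{k!} \right)$.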




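To conclude, since $k^{k} \leq e^{k} k!$,

$\mathbb{E}[\mathrm{Tr}(\hat{A}^{2s})] / N^{2s\gamma'} \leq L(N)\, N^{1+2\gamma}\, C_s \exp\left( N^{\mu} \sum_{k \geq 2} (2e N^{2\gamma - 2\gamma' + \gamma''})^{k} \right)$.

By the hypothesis made at Equation (10), we have $\frac{\mu}{4} + \gamma + \gamma'' \leq \gamma'$, so that the exponential
term stays bounded as $N \to \infty$. Using that $C_s \sim 4^{s} (\pi s^{3})^{-1/2}$, we get Theorem 2.1. □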



Remark 2.5. In the case where the entries $a_{ij}$, $1 \leq i, j \leq N$, are not symmetrically
distributed, one can prove a similar statement as in Theorem 2.1. The proof is based
on arguments already given in Section 4 of pQ and [20]. One can indeed assume that the
truncated entries are centered. Then, the main modification in evaluating $\mathbb{E}[\mathrm{Tr}(\hat{A}^{2s})] / N^{2s\gamma'}$
is that one has to take into account the contribution of paths with edges seen an odd number
of times. However any such edge is seen at least 3 times, because the entries are centered.
It can then be shown that the contribution of such paths is negligible (provided $s$ is small
enough as in Theorem 2.1), as to each such edge corresponds a vertex of type $> 1$.



3. Proof of Theorem 1.1

We are going to prove Theorem 1.1 as an application of Theorem 5.3 of the appendix,
for $c_n := b_N$, the sequence defined by (5). More precisely, we shall use its "random
versions": the case $\alpha < 1 + \mu^{-1}$ will be a consequence of Corollary 5.4, whereas in the case
$1 + \mu^{-1} \leq \alpha < 2(1 + \mu^{-1})$, we need to truncate the entries, and the conclusion will follow
from Corollary 5.5.

Hypothesis (2) implies that the distribution of the non-zero entries is in the max-domain
of attraction of the Fréchet distribution with exponent $\alpha$ (see [21], p. 54). By e.g. [T§] Th.
2.3.1], it implies that as $N \to \infty$, the point process

$\sum_{k \,;\, |a_{i_k j_k}| > 0} \delta_{|a_{i_k j_k}| / b_N}$

converges in distribution to a Poisson point process on $(0, +\infty)$ with intensity measure
$\frac{\alpha}{x^{\alpha+1}} dx$. It explains why the second part of Theorem 1.1 is a consequence of its first part
and why Hypothesis (25) of the corollaries 5.4 and 5.5 is satisfied by the $|a_{ij}|$'s.

Note that by (2), for any $\theta > 0$, for any non identically null $a_{ij}$,

(14) $\mathbb{P}(|a_{ij}| > b_N^{\theta}) \overset{\mathrm{sl.}}{\sim} b_N^{-\alpha\theta} \overset{\mathrm{sl.}}{\sim} N^{-\theta(1+\mu)}$.

The following claim (valid without any assumption on $\alpha$) is a direct consequence of (14)
and of the union bound.

Claim 3.1. For any $\eta > 0$, with probability going to one as $N \to \infty$, we have:

a) no row of $A_N$ has two entries larger, in absolute value, than $b_N^{\frac{1+2\mu}{2(1+\mu)} + \eta}$ (the exponent
$\frac{1+2\mu}{2(1+\mu)}$ increases from $\frac{1}{2}$ to $\frac{3}{4}$ as $\mu$ increases from 0 to 1),

b) the matrix $A_N$ has no diagonal entry larger, in absolute value, than $b_N^{\frac{1}{1+\mu} + \eta}$.

So for any positive $\delta, \delta'$, parts (b.i) and (b.ii) of the random version of Hypothesis 5.2
are satisfied with

(15) $\kappa := \frac{1+2\mu}{2(1+\mu)} + \delta$, $\tau := \frac{1}{1+\mu} + \delta'$.

Let us now verify Part (b.iii).

3.1. Case where $\alpha < 1 + \mu^{-1}$. Set

(16) $S := \sum_{j \,;\, |a_{ij}| \leq b_N^{\kappa}} |a_{ij}|$

(we suppress the dependence in $N$ to simplify notation). We shall prove that there is $\nu < 1$
such that $\mathbb{P}(S > b_N^{\nu})$ is exponentially small with some bounds that are uniform on $i$. The
sum $S$ can be rewritten $S = S_1 + S_2 + S_3$ as follows :

The sums $S_1, S_2, S_3$ can be treated with respectively parts a), c) and d) of Proposition 5.6 of

the appendix. The treatment of Si uses the facts b^ ~ N^ 1 and that /i + ^(l — a) + < 
which is always true when a < 1 and which is a consequence of a < 1 + /i -1 when a > 1. 



3.2. Case where $1 + \mu^{-1} \leq \alpha < 2(1 + \mu^{-1})$. We have seen at (6) that $b_N \overset{\mathrm{sl.}}{\sim} N^{\frac{1+\mu}{\alpha}}$. So
to apply Corollary 5.5 for $c_n = b_N$, we have to find a cut-off exponent $\gamma$ satisfying both
following constraints:

1) for $\kappa$ defined by (15), there is $\varepsilon > 0$ such that with exponentially high probability,

(17) $S := \sum_{j \,;\, N^{\gamma} < |a_{ij}| \leq b_N^{\kappa}} |a_{ij}| \leq N^{\frac{1+\mu}{\alpha} - \varepsilon}$,

2) there is $\varepsilon' > 0$ such that with probability tending to one, we have:

$\|\hat{A}_N\| \leq N^{\frac{1+\mu}{\alpha} - \varepsilon'}$ (with $\hat{A}_N := [a_{ij} 1_{|a_{ij}| \leq N^{\gamma}}]_{1 \leq i,j \leq N}$).

By Corollary 2.2, the second condition is satisfied when $\max\{\frac{\mu}{2}, \frac{\mu}{4} + \gamma\} < \frac{1+\mu}{\alpha}$. As
$\alpha < 2(1 + \mu^{-1})$, we have $\frac{\mu}{2} < \frac{1+\mu}{\alpha}$, so for condition 2) to be verified, one only needs

$\gamma < \frac{1+\mu}{\alpha} - \frac{\mu}{4}$.

To treat the sum $S$ of (17), one proceeds as we did to treat the sum $S$ defined at (16) in
the case $\alpha < 1 + \mu^{-1}$, except that now,

Sx= i a yi 

iVr<|ai J -|<iVa - '' 

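and $S_1$ is treated thanks to Part b) of Proposition 5.6. Indeed, Part b) of Proposition 5.6
implies that w.e.h.p., $S_1 \leq N^{\mu - \gamma(\alpha - 1) + \eta}$ with $\eta > 0$ as small as we need. Hence to fulfill
Condition 1), one needs $\gamma$ to satisfy

$\mu - \gamma(\alpha - 1) < \frac{1+\mu}{\alpha}$,

i.e. $\gamma > \frac{\mu}{\alpha} - \frac{1}{\alpha(\alpha - 1)}$. To sum up, we need to find a cut-off exponent $\gamma$ such that

$\frac{\mu}{\alpha} - \frac{1}{\alpha(\alpha - 1)} < \gamma < \frac{1+\mu}{\alpha} - \frac{\mu}{4}$.

Hence to conclude, it suffices to remark that $\alpha < 2(1 + \mu^{-1})$ implies

(18) $\frac{\mu}{\alpha} - \frac{1}{\alpha(\alpha - 1)} < \frac{1+\mu}{\alpha} - \frac{\mu}{4}$.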
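
The admissible window for the cut-off exponent in this case is elementary to compute. The sketch below (hypothetical function name) returns the two bounds appearing in (18); their difference simplifies to $\frac{1}{\alpha - 1} - \frac{\mu}{4}$, which is positive as soon as $\alpha < 1 + 4/\mu$, and in particular when $\alpha < 2(1 + \mu^{-1})$.

```python
def subcritical_cutoff_window(alpha: float, mu: float):
    """Bounds (lower, upper) for the cut-off exponent gamma in Section 3.2.

    lower = mu/alpha - 1/(alpha*(alpha - 1)),  upper = (1 + mu)/alpha - mu/4.
    Intended for 1 + 1/mu <= alpha < 2*(1 + 1/mu); raises otherwise.
    """
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must lie in (0, 1]")
    if not (1.0 + 1.0 / mu <= alpha < 2.0 * (1.0 + 1.0 / mu)):
        raise ValueError("alpha is outside the range treated in Section 3.2")
    lower = mu / alpha - 1.0 / (alpha * (alpha - 1.0))
    upper = (1.0 + mu) / alpha - mu / 4.0
    return lower, upper
```

For example, with $\alpha = 3$ and $\mu = 1$ the window is $(1/6, 5/12)$.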

4. Proof of Theorem 1.5

4.1. Eigenvalues. Let us first prove the part about the eigenvalues, i.e. Equation (7).
First, by Theorem 1.8, for any fixed $k \geq 1$, we have

$\liminf \frac{\lambda_k}{N^{\mu/2}} \geq 2$.

Let us now prove that

(19) $\limsup \frac{\lambda_1}{N^{\mu/2}} \leq 2$.

To do that, we will prove that one can find a cut-off exponent $\gamma$ such that for
$\hat{A}_N := [a_{ij} 1_{|a_{ij}| \leq N^{\gamma}}]_{1 \leq i,j \leq N}$, we have

(20) $\limsup \frac{\|\hat{A}_N\|}{N^{\mu/2}} \leq 2$ and $\|A_N - \hat{A}_N\| = o(N^{\mu/2})$.

To treat the first part of (20), we apply Corollary 2.2 with $\gamma' = \frac{\mu}{2}$ and $\gamma'' > 0$ such that

$\frac{\mu}{4} + \gamma + \gamma'' \leq \gamma'$.

For such a $\gamma''$ to exist, the constraint on $\gamma$ is that $\gamma < \frac{\mu}{4}$. To treat the second part of (20),
we use the following claim and the fact that

(21) $\|A_N - \hat{A}_N\| = \sup_{\lambda \text{ eig. of } A_N - \hat{A}_N} |\lambda| \leq \|A_N - \hat{A}_N\|_{\ell^{\infty} \to \ell^{\infty}} \leq \max_{i} \sum_{j \,;\, |a_{ij}| > N^{\gamma}} |a_{ij}|$.




Claim 4.1. Under the hypothesis that $\alpha > 2(1 + \mu^{-1})$, for any $\gamma > \frac{\mu}{2(\alpha - 1)}$, there is $\eta > 0$
such that with probability tending to one, we have:

$\max_{i} \sum_{j \,;\, |a_{ij}| > N^{\gamma}} |a_{ij}| \leq N^{\frac{\mu}{2} - \eta}$.

Let us conclude the proof of the eigenvalues part of Theorem 1.5 before proving the
claim. All we need is to find a cut-off exponent $\gamma$ such that

$\frac{\mu}{2(\alpha - 1)} < \gamma < \frac{\mu}{4}$.

The existence of such a $\gamma$ is equivalent to the fact that $\alpha - 1 > 2$, which is true because
$\alpha > 2(1 + \mu^{-1}) \geq 4$.
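
The existence of the cut-off exponent can be checked numerically. The sketch below (hypothetical function name) returns the open window $\left( \frac{\mu}{2(\alpha-1)}, \frac{\mu}{4} \right)$ when it is nonempty and `None` otherwise; the window is nonempty exactly when $\alpha > 3$, hence always in the supercritical case.

```python
def supercritical_cutoff_window(alpha: float, mu: float):
    """Open window (mu / (2*(alpha - 1)), mu / 4) for gamma, or None if empty."""
    if alpha <= 1.0 or not 0.0 < mu <= 1.0:
        raise ValueError("expected alpha > 1 and mu in (0, 1]")
    lower = mu / (2.0 * (alpha - 1.0))
    upper = mu / 4.0
    return (lower, upper) if lower < upper else None
```

For $\alpha = 5$ and $\mu = 1$ the window is $(1/8, 1/4)$, while it is empty for $\alpha = 3$.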

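Proof of the claim. Let $S(i)$ be the sum in the statement. By (2), it is easy to see that
for any $\theta > \frac{1+\mu}{\alpha}$, with probability tending to one, we have

$\max_{i,j} |a_{ij}| \leq N^{\theta}$.

Hence by Part a) of Claim 3.1 (using the fact that $b_N \overset{\mathrm{sl.}}{\sim} N^{\frac{1+\mu}{\alpha}}$), for such a $\theta$, with probability
tending to one, for all $i$,

$S(i) \leq \sum_{j \,;\, N^{\gamma} < |a_{ij}| \leq N^{\theta}} |a_{ij}|$.

Using parts b), c), d) of Proposition 5.6 and cutting the sum in three pieces, it is easy to
see that for any $\varphi > \max\{\mu - \gamma(\alpha - 1), \theta\}$, we have, w.e.h.p., uniformly on $i$,

$\sum_{j \,;\, N^{\gamma} < |a_{ij}| \leq N^{\theta}} |a_{ij}| \leq N^{\varphi}$.

Now, to conclude the proof of the claim, it suffices to notice that the hypotheses $\gamma > \frac{\mu}{2(\alpha - 1)}$
and $\alpha > 2(1 + \mu^{-1})$ are respectively equivalent to $\mu - \gamma(\alpha - 1) < \frac{\mu}{2}$ and $\frac{1+\mu}{\alpha} < \frac{\mu}{2}$, so that
one can find some exponents $\theta, \varphi$ satisfying

$\frac{1+\mu}{\alpha} < \theta$ and $\max\{\mu - \gamma(\alpha - 1), \theta\} < \varphi < \frac{\mu}{2}$.

□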



4.2. Eigenvectors. We shall first prove the following lemma. Let us recall that a principal
submatrix of a matrix $H = [x_{ij}]_{1 \leq i,j \leq N}$ is a matrix of the type $[x_{j_k j_{\ell}}]_{1 \leq k,\ell \leq L}$, where
$1 \leq L \leq N$ and $1 \leq j_1 < \cdots < j_L \leq N$. The submatrix will be said to be successively
extracted if the indices $j_1, \ldots, j_L$ form an interval of the set $\{1, \ldots, N\}$ endowed with the
cyclic order.

Lemma 4.2. Let $H$ be a Hermitian matrix and $\rho_L(H)$ (resp. $\rho_L^{\mathrm{succ}}(H)$) be the maximum
spectral radius of its $L \times L$ principal (resp. principal successively extracted) submatrices.
Let $\lambda$ be an eigenvalue of $H$ and $v$ an associated unit eigenvector.

If $v$ is $(L, \eta)$-localized, then $|\lambda| \leq \frac{\rho_L(H) + \sqrt{\eta}\, \rho(H)}{\sqrt{1 - \eta}}$.

If $v$ is $(L, \eta)$-successively localized, then $|\lambda| \leq \frac{\rho_L^{\mathrm{succ}}(H) + \sqrt{\eta}\, \rho(H)}{\sqrt{1 - \eta}}$.

Proof. Let $j_1 < \cdots < j_L$ be indices such that $\sum_{\ell=1}^{L} |v_{j_{\ell}}|^2 \geq 1 - \eta$ and let $P$ be the orthogonal
projection onto the subspace generated by the vectors $e_{j_1}, \ldots, e_{j_L}$ (the $e_j$'s are the vectors
of the canonical basis). We have

$\lambda P v = P H v = P H P v + P H (1 - P) v$.

Then the conclusion follows directly from the following

$|\lambda| \times \sqrt{1 - \eta} \leq |\lambda| \times \|P v\| \leq \rho(PHP) + \rho(H) \|(1 - P) v\| \leq \rho(PHP) + \sqrt{\eta}\, \rho(H)$.

□

Claim 4.3. Let us suppose that $\alpha > 2(1 + \mu^{-1})$.

a) Let us fix $c$ such that

(22) $c < \frac{2}{5}\, \mu\, \frac{\alpha - 2}{\alpha - 1}$.

Then there is $\varepsilon > 0$ such that w.e.h.p., the following holds:

For any $\lfloor N^{c} \rfloor \times \lfloor N^{c} \rfloor$ principal submatrix $B$ of $A_N$, $\|B\| \leq N^{\frac{\mu}{2} - \varepsilon}$.

b) Let us fix $c$ such that

(23) $c < \mu$.

Then there is $\varepsilon > 0$ such that w.e.h.p., the following holds:

For any $\lfloor N^{c} \rfloor \times \lfloor N^{c} \rfloor$ successively extracted principal submatrix $B$ of $A_N$,
we have $\|B\| \leq N^{\frac{\mu}{2} - \varepsilon}$.

Before proving the claim, let us conclude the proof of the eigenvectors part of Theorem
1.5. We know that $\rho(A) \sim 2N^{\mu/2}$ and that there is $\varepsilon > 0$ such that with probability tending
to one, $\rho_L(A)$ (resp. $\rho_L^{\mathrm{succ}}(A)$) is bounded from above by $N^{\frac{\mu}{2} - \varepsilon}$. Since
$\frac{\sqrt{\eta}}{\sqrt{1 - \eta}} \leq \frac{\sqrt{\eta}}{\sqrt{1 - \eta_0}} < \sqrt{2\eta}$,
Lemma 4.2 allows us to conclude.

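Proof of the claim. We shall treat a) and b) at the same time. Let us first note that
Equation (22) (resp. Equation (23)) is equivalent to (resp. implies that)

$\frac{c}{4} + \frac{\mu}{2(\alpha - 1)} + c < \frac{\mu}{2}$ (resp. $\frac{c}{4} + \frac{\mu}{2(\alpha - 1)} < \frac{\mu}{2}$).

Hence one can choose some positive exponents $\varepsilon, \gamma, \gamma', \gamma''$ such that $\gamma' \geq c/2$ and

(24) $\gamma > \frac{\mu}{2(\alpha - 1)}$, $\gamma'' > c$ (resp. $\gamma'' > 0$) and $\frac{c}{4} + \gamma + \gamma'' \leq \gamma' < \frac{\mu}{2} - \varepsilon$.

Any submatrix $B = [a_{j_k j_{\ell}}]_{1 \leq k,\ell \leq \lfloor N^{c} \rfloor}$ can be written $B = \hat{B} + (B - \hat{B})$, with
$\hat{B} := [a_{j_k j_{\ell}} 1_{|a_{j_k j_{\ell}}| \leq N^{\gamma}}]_{k,\ell}$. We know (see e.g. (21)) that independently of the choice of the $j_k$'s,

$\|B - \hat{B}\| \leq \max_{1 \leq i \leq N} \sum_{j \text{ s.t. } |a_{ij}| > N^{\gamma}} |a_{ij}|$.

Hence by Claim 4.1, the condition $\gamma > \frac{\mu}{2(\alpha - 1)}$ of Equation (24) ensures that for a
certain $\eta > 0$, with probability tending to one, independently of the choice of the $j_k$'s,
$\|B - \hat{B}\| \leq N^{\frac{\mu}{2} - \eta}$. Hence one can focus on $\hat{B}$.

Let us now apply Corollary 2.2 and Remark 2.4. We get that for any choice of $j_1, \ldots, j_{\lfloor N^{c} \rfloor}$,
up to a polynomial factor in the RHT,

$\mathbb{P}(\|\hat{B}\| > N^{\frac{\mu}{2} - \varepsilon}) \leq N^{-(\frac{\mu}{2} - \varepsilon - \gamma')\, 2 \lfloor N^{\gamma''} \rfloor}$.

But there are at most $N^{N^{c}}$ (resp. $N$) ways to choose the indices $j_1, \ldots, j_{\lfloor N^{c} \rfloor}$ of the rows
of the submatrix $B$ (resp. of the successively extracted submatrix $B$). Hence the probability
that $\|\hat{B}\| > N^{\frac{\mu}{2} - \varepsilon}$ for at least one of these choices is $\leq N^{-(\frac{\mu}{2} - \varepsilon - \gamma')\, 2 \lfloor N^{\gamma''} \rfloor + N^{c}}$ (resp.
$\leq N^{-(\frac{\mu}{2} - \varepsilon - \gamma')\, 2 \lfloor N^{\gamma''} \rfloor + 1}$). Since by (24), $\gamma'' > c$ (resp. $\gamma'' > 0$) and $\frac{\mu}{2} - \varepsilon - \gamma' > 0$, the
conclusion follows. □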



5. Appendix 

5.1. Eigenvalues and eigenvectors under perturbation. In this section, we state a 
result about eigenvectors and eigenvalues of perturbed Hermitian matrices. The part about 
eigenvalues can be found in the literature (see the books by Bhatia [31 H]), but we did not 
find the part about the eigenvectors in the literature. 

Proposition 5.1. Let $H$ be a Hermitian matrix and $v$ be a unit vector such that for a certain $\lambda \in \mathbb{R}$,

\[ Hv = \lambda v + \varepsilon w, \]
with $w$ a unit vector such that $w \perp v$ and $\varepsilon > 0$.

a) Then $H$ has an eigenvalue $\lambda_\varepsilon$ in the ball $B(\lambda, \varepsilon)$.

b) Suppose moreover that $H$ has only one eigenvalue (counted with multiplicity) in $B(\lambda, \varepsilon)$ and that all other eigenvalues are at distance at least $d > \varepsilon$ of $\lambda$. Then for $v_\varepsilon$ a unit eigenvector associated to $\lambda_\varepsilon$, we have

\[ \|v_\varepsilon - P_v(v_\varepsilon)\| \le \frac{2\varepsilon}{d - \varepsilon}, \]
where $P_v$ denotes the orthogonal projection onto $\operatorname{Span}(v)$.

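Proof. Part a) is a simple consequence of perturbation theory (see e.g. Lemma A.2 in [18]). Let $v_\varepsilon$ be a normalized eigenvector associated to $\lambda_\varepsilon$. We decompose $v_\varepsilon = \langle v_\varepsilon, v\rangle v + r$ with $r \perp v$. Then $Hv_\varepsilon = \langle v_\varepsilon, v\rangle(\lambda v + \varepsilon w) + Hr$. Since $Hv_\varepsilon = \lambda_\varepsilon v_\varepsilon$, we deduce that

\[ (\lambda - H) r = \langle v_\varepsilon, v\rangle \varepsilon w + (\lambda - \lambda_\varepsilon) v_\varepsilon. \]

This yields the result, by considering the norm of $(\lambda - H)$ restricted to the subspace $v^\perp$. □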
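The statement of Proposition 5.1 can be checked numerically on a small example. The sketch below assumes NumPy is available; the helper `perturbation_check`, the test matrix and the seed are illustrative choices and do not come from the paper. It computes $\lambda = \langle v, Hv\rangle$ and the residual size $\varepsilon$, then compares $|\lambda_\varepsilon - \lambda|$ with $\varepsilon$ and the norm of the component of $v_\varepsilon$ orthogonal to $v$ with $\varepsilon/d$. The last comparison follows from expanding $v$ in an eigenbasis of $H$.

```python
import numpy as np

def perturbation_check(H, v):
    """Return (lam, eps, lam_eps, d, r_norm) for a Hermitian H and a nonzero vector v."""
    v = v / np.linalg.norm(v)
    Hv = H @ v
    lam = np.vdot(v, Hv).real               # with this choice the residual is orthogonal to v
    eps = np.linalg.norm(Hv - lam * v)      # H v = lam v + eps w, w a unit vector orthogonal to v
    evals, evecs = np.linalg.eigh(H)
    k = int(np.argmin(np.abs(evals - lam)))               # eigenvalue closest to lam
    d = float(np.min(np.abs(np.delete(evals, k) - lam)))  # distance from lam to the other eigenvalues
    v_eps = evecs[:, k]
    r = v_eps - np.vdot(v, v_eps) * v       # component of v_eps orthogonal to v
    return lam, eps, float(evals[k]), d, float(np.linalg.norm(r))

# Hypothetical test matrix: small symmetric noise plus one large off-diagonal entry.
rng = np.random.default_rng(0)
n = 50
noise = 0.01 * rng.standard_normal((n, n))
H = (noise + noise.T) / 2
H[0, 1] = H[1, 0] = 5.0
v = np.zeros(n)
v[0] = v[1] = 1.0                           # approximate eigenvector (e_0 + e_1)/sqrt(2)

lam, eps, lam_eps, d, r_norm = perturbation_check(H, v)
print(lam, eps, lam_eps, d, r_norm)
```

Here the approximate eigenvector is supported on the two coordinates of the large entry, which is the situation used in Section 5.2.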

5.2. Largest eigenvalues vs largest entries of matrices. In this section, we present a synthetic version of some ideas that first appeared in Soshnikov's paper [26]. We also extend these ideas to the level of eigenvectors. Theorem 5.3 below gives a sufficient condition for a large deterministic Hermitian matrix to have its kth largest eigenvalue approximately equal to its kth largest entry in absolute value, for all fixed k. Note that this is not what usually happens: in some way the large entries need to overwhelm the other entries. We also give a sufficient condition for the corresponding eigenvector to be approximately equal to the eigenvector of the symmetric matrix formed by forcing all but this kth largest entry to be 0. The sufficient condition is, roughly speaking, that the largest entries and their spacings have an order $c_n \gg 1$, are sufficiently well spread out in the matrix and that, after removing these largest entries, the sum of the terms of each row of the matrix has order $\ll c_n$. In Corollary 5.4 we give the random matrix version of this theorem and in Corollary 5.5 we explain how one is allowed to first remove a part of the matrix which does not affect the largest entries.

For each $n$, let $H_n$ be an $n \times n$ deterministic Hermitian matrix with entries $h_{ij}$. Let us denote by $\lambda_1 \ge \lambda_2 \ge \cdots$ the eigenvalues of $H_n$ (they depend implicitly on $n$) and let us choose some unit associated eigenvectors

\[ v_1, v_2, \ldots \]

Let us also introduce a set of pairs of indices $(i_1 \le j_1), (i_2 \le j_2), \ldots$ such that for all $k$, $|h_{i_k j_k}|$ is the kth largest entry, in absolute value, of $H_n$. Let $\theta_k \in \mathbb{R}$ be such that $h_{i_k j_k} = |h_{i_k j_k}| e^{2i\theta_k}$. The eigenvectors $v_1, v_2, \ldots$ are chosen such that for each $k$, $e^{-i\theta_k}\langle v_k, e_{i_k}\rangle \ge 0$. We make the following hypotheses.

Hypothesis 5.2. (a) There is a sequence $c_n \to +\infty$ such that for any fixed $k$,
(a.i) $c_{n+k} \sim c_n$,

(a.ii) $0 < \liminf \frac{|h_{i_k j_k}|}{c_n} \le \limsup \frac{|h_{i_k j_k}|}{c_n} < \infty$ and $\liminf \frac{|h_{i_k j_k}| - |h_{i_{k+1} j_{k+1}}|}{c_n} > 0$.

(b) There exist three exponents $\kappa, \tau, \nu \in (0, 1)$ such that for $n$ large enough,
(b.i) no row of $H_n$ has two entries larger, in absolute value, than $c_n^{\kappa}$,

(b.ii) no diagonal entry of $H_n$ is larger, in absolute value, than $c_n^{\tau}$,

(b.iii) for each $i \in \{1, \ldots, n\}$,

\[ \sum_{j \,;\, |h_{ij}| \le c_n^{\kappa}} |h_{ij}| \le c_n^{\nu}. \]

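Theorem 5.3. Under Hypothesis 5.2, as $n \to \infty$, for any $k \ge 1$ fixed,

\[ \frac{\lambda_k}{|h_{i_k j_k}|} \longrightarrow 1 \qquad \text{and} \qquad \left\| v_k - \frac{e^{i\theta_k} e_{i_k} + e^{-i\theta_k} e_{j_k}}{\sqrt{2}} \right\| \longrightarrow 0 \quad (\text{for the } \ell^2\text{-norm}). \]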

Before proving the theorem, let us state its two "random versions".

Corollary 5.4. Suppose now that the matrix $H_n$ is random, that the sequence $c_n$ is deterministic and satisfies Hypothesis (a.i), replace Hypothesis (a.ii) by

\[ \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \; \mathbb{P}\Big(\frac{|h_{i_k j_k}|}{c_n} < \varepsilon\Big) + \mathbb{P}\Big(\frac{|h_{i_k j_k}|}{c_n} > \frac{1}{\varepsilon}\Big) + \mathbb{P}\Big(\frac{|h_{i_k j_k}| - |h_{i_{k+1} j_{k+1}}|}{c_n} < \varepsilon\Big) = 0 \tag{25} \]

and suppose that Hypothesis (b) holds with probability tending to one. Then the conclusions of the theorem remain true for the convergence in probability.

Proof. Recall that a sequence of real random variables converges in probability to a deterministic limit if and only if from each of its subsequences, one can extract a subsequence converging almost surely. Hence it suffices to notice that the deterministic theorem also holds (obviously) if one replaces the sequence $H_n$ of $n \times n$ matrices by a sequence $H_{\varphi(n)}$ of $\varphi(n) \times \varphi(n)$ matrices with $\varphi(n) \to +\infty$. □

By Proposition 5.1, one directly deduces the following corollary.

Corollary 5.5. Suppose that one can write $H_n = \tilde H_n + (H_n - \tilde H_n)$, where $H_n - \tilde H_n$ satisfies the hypotheses of Corollary 5.4 and that for a certain $\rho < 1$,

\[ \frac{\|\tilde H_n\|}{c_n^{\rho}} \quad \text{converges in probability to zero.} \tag{26} \]

Then the conclusions of Theorem 5.3 for $H_n$ remain true for the convergence in probability.

Proof of Theorem 5.3. We suppress the dependence on $n$ to simplify notation.

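Fact 1: We have $\|H\|_{\ell^\infty \to \ell^\infty} = |h_{i_1 j_1}|(1 + o(1))$ (and as a consequence, $\lambda_1 \le |h_{i_1 j_1}|(1 + o(1))$).

Indeed, $\|H\|_{\ell^\infty \to \ell^\infty} = \max_i \sum_j |h_{ij}| \ge |h_{i_1 j_1}|$, thus to prove Fact 1, it suffices to notice that for $n$ large enough, for all $i$,

\[ \sum_j |h_{ij}| \le \sum_{j \,;\, |h_{ij}| \le c_n^{\kappa}} |h_{ij}| + \max_j |h_{ij}| \le c_n^{\nu} + \max_j |h_{ij}| \le c_n^{\nu} + |h_{i_1 j_1}|, \]

where the first inequality follows from (b.i), and that $c_n^{\nu} \ll |h_{i_1 j_1}|$ by (a.ii).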

Fact 2: For any fixed $k \ge 1$, $\lambda_k \le |h_{i_k j_k}|(1 + o(1))$.

Indeed, the hypotheses ensure that for $n$ large enough, the numbers $i_1, \ldots, i_k$ are pairwise distinct, so that the largest entry, in absolute value, of the $(n-k+1) \times (n-k+1)$ matrix $H^{(k)}$ deduced from $H$ by removing rows and columns with indices $i_1, \ldots, i_{k-1}$, is $|h_{i_k j_k}|$. This matrix (more specifically: this sequence of matrices, because $n$ is an implicit parameter here) also satisfies the previous hypotheses (for the sequence $\tilde c_n := c_{n+k-1}$). Hence by the previous fact, $\lambda_1(H^{(k)}) \le |h_{i_k j_k}|(1 + o(1))$. But by Weyl's interlacing inequalities, we have $\lambda_k(H) \le \lambda_1(H^{(k)})$. This allows us to conclude.
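The two deterministic ingredients of Facts 1 and 2 can be checked numerically: the largest eigenvalue is bounded by the maximal absolute row sum, and the kth eigenvalue is bounded by the top eigenvalue of the matrix with $k-1$ rows and columns removed. The sketch below assumes NumPy; the test matrix, the seed and the removed indices are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 30, 3
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                              # real symmetric test matrix (arbitrary choice)

# Ingredient of Fact 1: the l^infty -> l^infty operator norm is the largest
# absolute row sum, and it dominates every eigenvalue of H.
row_sum_norm = np.abs(H).sum(axis=1).max()
eigs = np.sort(np.linalg.eigvalsh(H))[::-1]    # eigenvalues in decreasing order
lambda_1 = eigs[0]

# Ingredient of Fact 2: remove k-1 rows and the matching columns, then compare
# the k-th eigenvalue of H with the top eigenvalue of the remaining submatrix.
removed = [0, 5, 7][: k - 1]                   # hypothetical indices i_1, ..., i_{k-1}
keep = [i for i in range(n) if i not in removed]
H_k = H[np.ix_(keep, keep)]
lambda_1_sub = np.linalg.eigvalsh(H_k).max()
lambda_k = eigs[k - 1]

print(lambda_1 <= row_sum_norm, lambda_k <= lambda_1_sub)
```

Both inequalities hold for every Hermitian matrix, so the random matrix here only serves as a generic test case.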



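Fact 3: For any fixed $k \ge 1$, for

\[ v := \frac{e^{i\theta_k} e_{i_k} + e^{-i\theta_k} e_{j_k}}{\sqrt{2}}, \]

we have

\[ Hv = |h_{i_k j_k}|\, v + r \qquad \text{with} \qquad \|r\| = o(c_n). \]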



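Indeed, for $r := Hv - |h_{i_k j_k}| v$, it is easy to see that

\[ \|r\| \le |h_{i_k i_k}| + |h_{j_k j_k}| + \Big( \sum_{l \notin \{i_k, j_k\}} \big( |h_{l i_k}|^2 + |h_{l j_k}|^2 \big) \Big)^{1/2}. \]

We have $\|r\| = o(c_n)$ because by Hypothesis (b), $|h_{i_k i_k}| + |h_{j_k j_k}| \le 2c_n^{\tau}$ and

\[ \sum_{l \notin \{i_k, j_k\}} |h_{l i_k}|^2 \le c_n^{\kappa} \sum_{l \,;\, |h_{i_k l}| \le c_n^{\kappa}} |h_{i_k l}| \le c_n^{\kappa + \nu}, \]

and likewise with $j_k$ in place of $i_k$.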

Let us now conclude the proof of the theorem. Since $|h_{i_k j_k}|$ has order $c_n$, Fact 3 and Part a) of Proposition 5.1 imply that for any fixed $k \ge 1$, $H$ has an eigenvalue equal to $|h_{i_k j_k}|(1 + o(1))$. Hence by Fact 2 and Hypothesis (a.ii), $\lambda_k = |h_{i_k j_k}|(1 + o(1))$. By Hypothesis (a.ii) again, it follows that $|\lambda_k - \lambda_{k+1}|$ has order $c_n$ and so one can apply Part b) of Proposition 5.1 to deduce from Fact 3 that

\[ \left\| v_k - \frac{e^{i\theta_k} e_{i_k} + e^{-i\theta_k} e_{j_k}}{\sqrt{2}} \right\| \longrightarrow 0. \]

□
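The conclusion of Theorem 5.3 can be illustrated on a real symmetric matrix with a few dominant, well separated entries. This is a sketch under stated assumptions, not a simulation of the band matrices of the paper: NumPy is assumed, and the size, noise level, seed and planted values are hypothetical. For a real entry $h$ one has $e^{2i\theta} = \operatorname{sign}(h)$, so the predicted eigenvector is $(e_i + \operatorname{sign}(h) e_j)/\sqrt{2}$ up to a global phase.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
noise = 0.005 * rng.standard_normal((n, n))
H = (noise + noise.T) / 2                      # small symmetric background
# Planted large entries, at most one per row, off the diagonal (hypothetical values).
planted = [((0, 1), 10.0), ((2, 3), -7.0), ((4, 5), 5.0)]
for (i, j), h in planted:
    H[i, j] = H[j, i] = h

evals, evecs = np.linalg.eigh(H)
order = np.argsort(evals)[::-1]                # indices of eigenvalues in decreasing order

ratios, overlaps = [], []
for k, ((i, j), h) in enumerate(planted):
    lam_k = evals[order[k]]
    v_k = evecs[:, order[k]]
    pred = np.zeros(n)                         # predicted eigenvector (e_i + sign(h) e_j)/sqrt(2)
    pred[i], pred[j] = 1 / np.sqrt(2), np.sign(h) / np.sqrt(2)
    ratios.append(lam_k / abs(h))              # should be close to 1
    overlaps.append(abs(pred @ v_k))           # close to 1 means v_k is carried by two coordinates

print(ratios, overlaps)
```

The absolute value in the overlap removes the sign ambiguity of the eigenvector returned by the solver.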



5.3. Sums of truncated heavy-tailed random variables. In this section, we give 
exponential estimates on the concentration of sums of truncated heavy-tailed variables. In 
the paper, these estimates are needed for example to give upper bounds on the spectral 
radius of matrices via the maximum of the sums of the entries along the rows. 

Let us consider some i.i.d. variables $Y_i \ge 0$ such that for a certain $\alpha > 0$,
\[ \mathbb{P}(Y_1 \ge y) \sim y^{-\alpha} \quad \text{as } y \to \infty. \tag{27} \]

Let us also fix a sequence $d_n \sim n^{\mu}$ for a fixed $\mu > 0$.
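A concrete instance of (27) is the Pareto law with $\mathbb{P}(Y \ge y) = y^{-\alpha}$ for $y \ge 1$, which can be sampled by inverse transform. The sketch below uses only the Python standard library; the function name `sample_heavy_tailed`, the parameters and the seed are illustrative assumptions, and the exact Pareto law is just one special case of the asymptotic condition (27).

```python
import random

def sample_heavy_tailed(alpha, rng):
    """Draw Y >= 1 with P(Y >= y) = y**(-alpha) for y >= 1 (inverse transform)."""
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return u ** (-1.0 / alpha)

rng = random.Random(0)
alpha, y0, n_samples = 1.5, 10.0, 200_000
samples = [sample_heavy_tailed(alpha, rng) for _ in range(n_samples)]

# Compare the empirical tail at y0 with the exact value y0**(-alpha).
empirical_tail = sum(s >= y0 for s in samples) / n_samples
exact_tail = y0 ** (-alpha)
print(empirical_tail, exact_tail)
```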
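Proposition 5.6. a) For any sequence $\beta_n \sim n^{b}$ with $0 < b < \mu/\alpha$ and any $\varepsilon > 0$, we have w.e.h.p.

\[ \sum_{j=1}^{d_n} Y_j 1_{Y_j \le \beta_n} \le n^{\mu + b(1-\alpha)^+ + \varepsilon}. \]

b) If $\alpha > 1$, for any sequences $a_n \sim n^{a}$, $\beta_n \sim n^{b}$ with $0 < a < b < \mu/\alpha$ and any $\varepsilon > 0$, we have w.e.h.p.

\[ \sum_{j=1}^{d_n} Y_j 1_{a_n < Y_j \le \beta_n} \le n^{\mu + a(1-\alpha) + \varepsilon}. \]

c) For any sequences $a_n \sim n^{\frac{\mu}{\alpha} - \eta}$, $\beta_n \sim n^{\frac{\mu}{\alpha} + \eta'}$ with $\eta, \eta' > 0$ and any $\varepsilon > \alpha\eta + \eta'$, we have w.e.h.p.

\[ \sum_{j=1}^{d_n} Y_j 1_{a_n < Y_j \le \beta_n} \le n^{\frac{\mu}{\alpha} + \varepsilon}. \]

d) For any sequences $a_n \sim n^{\frac{\mu}{\alpha} + \eta}$ with $\eta > 0$, $\beta_n \sim n^{\beta}$ with $\beta > 0$ and any $\gamma > \beta$, we have w.e.h.p.

\[ \sum_{j=1}^{d_n} Y_j 1_{a_n < Y_j \le \beta_n} \le n^{\gamma}. \]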



Before proving the proposition, we shall first establish the following concentration result 
for sums of Bernoulli variables. 

Lemma 5.7. For each $n \ge 1$, let $X_1, \ldots, X_m$ be some independent Bernoulli variables with parameter $p$ ($m$, the $X_i$'s and $p$ depending on the parameter $n$). Suppose that $mp \ge Cn^{\theta}$ for some constants $C, \theta > 0$. Then for any fixed $\eta > 0$, we have w.e.h.p.

\[ \left| \frac{1}{m} \sum_{i=1}^{m} X_i - p \right| \le \eta p. \tag{28} \]



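Proof. Set $S := \sum_{i=1}^{m} X_i$. Let us for example give an upper bound on $\mathbb{P}(S \ge mp(1+\eta))$ (the other side can be treated in the same way with $\lambda < 0$). Using the fact that for any $\lambda > 0$,

\[ \log \mathbb{E}[e^{\lambda X_1}] = \log(1 + p(e^{\lambda} - 1)) \le p(e^{\lambda} - 1), \]

we have, for any $\lambda > 0$,

\[ \mathbb{P}(S \ge mp(1+\eta)) \le \mathbb{E}[e^{\lambda S}]\, e^{-\lambda mp(1+\eta)} \le e^{-mp\{\lambda(1+\eta) - (e^{\lambda} - 1)\}}. \]

We conclude by remarking that $\max_{\lambda > 0} \{\lambda(1+\eta) - (e^{\lambda} - 1)\} = (1+\eta)\log(1+\eta) - \eta > 0$. □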
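The Chernoff bound used in this proof can be compared with an exact binomial tail. The sketch below uses only the Python standard library; the function names and the values of $m$, $p$, $\eta$ are illustrative assumptions. The maximum over $\lambda$ is attained at $\lambda = \log(1+\eta)$, which gives the rate $(1+\eta)\log(1+\eta) - \eta$.

```python
import math

def chernoff_rate(eta):
    """Max over lam > 0 of lam*(1+eta) - (e^lam - 1), attained at lam = log(1+eta)."""
    return (1 + eta) * math.log(1 + eta) - eta

def chernoff_bound(m, p, eta):
    """Upper bound exp(-m*p*rate(eta)) on P(S >= m*p*(1+eta)) for S ~ Binomial(m, p)."""
    return math.exp(-m * p * chernoff_rate(eta))

def binomial_upper_tail(m, p, threshold):
    """Exact P(S >= threshold) for S ~ Binomial(m, p)."""
    k0 = math.ceil(threshold)
    return sum(math.comb(m, k) * p**k * (1 - p)**(m - k) for k in range(k0, m + 1))

m, p, eta = 200, 0.1, 0.5
exact = binomial_upper_tail(m, p, m * p * (1 + eta))
bound = chernoff_bound(m, p, eta)
print(exact, bound)
```

Since the bound decays like $e^{-c\,mp}$ with $c > 0$, the assumption $mp \ge Cn^{\theta}$ makes it exponentially small in a power of $n$.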
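Proof of Proposition 5.6. a) First, one gets rid of the $j$'s such that $Y_j \le 1$ because their sum is $\le d_n$. Then, set

\[ S_n := \sum_{j=1}^{d_n} Y_j 1_{1 < Y_j \le \beta_n} \]

and $k_\varepsilon := \lfloor b/\varepsilon \rfloor$. We have $\theta := \mu - \alpha k_\varepsilon \varepsilon > 0$ and for $n$ large enough, $\beta_n \le n^{(k_\varepsilon + 1)\varepsilon}$, so

\[ S_n \le \sum_{k=0}^{k_\varepsilon} \sum_{j=1}^{d_n} Y_j 1_{n^{k\varepsilon} < Y_j \le n^{(k+1)\varepsilon}} \le \sum_{k=0}^{k_\varepsilon} n^{(k+1)\varepsilon} Z_k^{(n)}, \qquad Z_k^{(n)} := \sum_{j=1}^{d_n} 1_{n^{k\varepsilon} < Y_j \le n^{(k+1)\varepsilon}}. \]

For each $k$, $Z_k^{(n)}$ is a sum of $d_n \sim n^{\mu}$ independent Bernoulli variables with parameter $p_k(n) \sim n^{-\alpha k \varepsilon}$. We have $d_n p_k(n) \sim n^{\mu - \alpha k \varepsilon}$, hence for $n$ large enough, $d_n p_k(n) \ge n^{\theta}$ (where $\theta$ is defined above). As a consequence, by Lemma 5.7, w.e.h.p., for each $k$,

\[ Z_k^{(n)} \le 2 d_n p_k(n). \]

This implies that

\[ S_n \le \sum_{k=0}^{k_\varepsilon} 2 d_n p_k(n)\, n^{(k+1)\varepsilon} = O\big(n^{\mu + b(1-\alpha)^+ + \varepsilon}\big), \]

which proves a) since $\varepsilon > 0$ is arbitrary.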

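b) Let $S_n$ be the considered sum. The proof works in the same way as the one of a), introducing $k_\varepsilon := \lfloor (b-a)/\varepsilon \rfloor$, $\theta := \mu - \alpha(a + k_\varepsilon \varepsilon) > 0$ and writing

\[ S_n \le \sum_{k=0}^{k_\varepsilon} \sum_{j=1}^{d_n} 1_{a_n n^{k\varepsilon} < Y_j \le a_n n^{(k+1)\varepsilon}} \, a_n n^{(k+1)\varepsilon}. \]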

c) This is a direct application of Lemma 5.7, since the considered sum is $\le \beta_n \times$ the sum of $d_n$ Bernoulli variables with parameter $\sim n^{-\mu + \alpha\eta}$.

d) Note that if the considered sum is $\ge n^{\gamma}$, then there are at least $n^{\gamma}/\beta_n$ nonzero terms in the sum. By the union bound, this happens with probability at most

\[ d_n^{\lceil n^{\gamma}/\beta_n \rceil} \times \mathbb{P}(Y_1 > a_n)^{\lceil n^{\gamma}/\beta_n \rceil} \]

(indeed, there are at most $d_n^{\lceil n^{\gamma}/\beta_n \rceil}$ subsets of cardinality $\lceil n^{\gamma}/\beta_n \rceil$ in $\{1, \ldots, d_n\}$). Using (27), one can easily check that this probability is exponentially small. □
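Part a) can be illustrated by a quick simulation. The sketch below assumes the bound of a) has the form $n^{\mu + b(1-\alpha)^+ + \varepsilon}$, as the last display of its proof suggests, and takes $d_n = \lfloor n^{\mu} \rfloor$, $\beta_n = n^{b}$ and exact Pareto variables as one admissible instance of (27). It uses only the Python standard library; all names and parameter values are hypothetical.

```python
import random

def truncated_sum(n, mu, alpha, b, rng):
    """Sum of Y_j * 1{Y_j <= n**b} over d_n = floor(n**mu) i.i.d. Pareto(alpha) draws."""
    d_n = int(n ** mu)
    beta_n = n ** b
    total = 0.0
    for _ in range(d_n):
        y = (1.0 - rng.random()) ** (-1.0 / alpha)   # P(Y >= y) = y**(-alpha) for y >= 1
        if y <= beta_n:                              # keep only the truncated variables
            total += y
    return total

n, mu, alpha, b, eps = 10_000, 1.0, 1.5, 0.5, 0.2    # note b < mu/alpha
s_n = truncated_sum(n, mu, alpha, b, random.Random(0))
exponent = mu + b * max(1.0 - alpha, 0.0) + eps      # mu + b(1-alpha)^+ + eps
print(s_n, n ** exponent)
```

With these parameters $\alpha > 1$, so the truncated variables have a bounded mean and the sum is of order $d_n$, well below $n^{\mu + \varepsilon}$.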